A simple model for cooperation between "selfish" agents, which play an extended version of the
Prisoner's Dilemma (PD) game with arbitrary payoffs, is presented and studied. A continuous
variable representing the probability of cooperation, $p_k(t) \in [0,1]$, is assigned to each agent
$k$ at time $t$. At each time step $t$ a pair of agents, chosen at random, interact by playing the
game. The players update their $p_k(t)$ using a criterion based on the comparison of their utilities
with the simplest estimate for expected income. The agents have no memory and use strategies based
neither on direct reciprocity nor on 'tags'. Depending on the payoff matrix, the system
self-organizes, after a transient, into stationary states characterized by their average
probability of cooperation $p_{eq}$ and average equilibrium per-capita income. It turns out that
the model exhibits some results that contradict intuition. In particular, some games which, a
priori, seem to favor defection most may produce a relatively high degree of cooperation.
Conversely, other games, which one would bet lead to maximum cooperation, are in fact not the
optimal ones for producing cooperation.

Keywords: Complex adaptive systems, Agent-based models, Social systems

PACS numbers: 02.50.Le, 87.23.Ge, 89.65.Gh, 89.75.-k

I. INTRODUCTION 


Game Theory constitutes a powerful and versatile approach to analyzing the collective behavior of
adaptive agents, from humans to bacteria and firms. In particular, the Prisoner's Dilemma (PD) game
plays in Game Theory a role similar to that of the harmonic oscillator in Physics. It has also been
referred to as the E. coli of the Social Sciences, allowing a very large variety of studies. Indeed,
this game, developed in the early fifties, offers a very simple and intuitive approach to the
problem of how cooperation emerges in societies of "selfish" individuals, i.e. individuals who
pursue exclusively their own benefit. It was used in a series of works by Robert Axelrod and
co-workers [1] to examine the basis of cooperation between selfish agents in a wide variety of
contexts. Furthermore, mechanisms of cooperation based on the PD have shown their usefulness in
Political Science [2]-[4], Economics [5]-[11], International Affairs [12]-[15], Theoretical Biology
[16]-[18] and Ecology [19]-[20].

The PD game involves two players, say i and j, each facing two choices: cooperate (C) or defect
(D). Each makes its choice without knowing what the other will do. The four possible outcomes for
the interaction of agent i with agent j are: 1) both cooperate (C,C); 2) both defect (D,D); 3) i
cooperates and j defects (C,D); and 4) i defects and j cooperates (D,C). In situations 1)-4), agent
i (j) gets, respectively: the "reward" R (R), the "punishment" P (P), the "sucker's payoff" S (the
"temptation to defect" T), or T (S). These four payoffs obey the following chain of inequalities:

$$T > R > P > S; \tag{1}$$

for instance, the four canonical PD payoffs are R = 3, S = 0, T = 5 and P = 1. Clearly it pays more
to defect: if one of the two players defects, say i, the other, if he cooperates, will end up with
nothing. In fact, even if agent i cooperates, agent j should defect, because in that case he will
get T, which is larger than R. That is, independently of what the other player does, defection D
yields a higher payoff than cooperation and is the dominant strategy for rational agents. In more
technical language, this is equivalent to saying that the outcome (D,D) is the Nash equilibrium
[21] of the PD game. The dilemma is that if both defect, both do worse than if both had cooperated:
both players get P, which is smaller than R.
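The dominance argument above can be checked mechanically. The following is a minimal sketch, assuming the canonical payoffs quoted in the text; the helper names `best_reply` and `is_nash` are illustrative and not part of the original work.

```python
# Canonical PD payoffs quoted in the text.
R, S, T, P = 3, 0, 5, 1

# payoff[(own_action, other_action)] is the payoff of the player choosing own_action.
payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
ACTIONS = ("C", "D")


def best_reply(other_action):
    """Action that maximizes a player's payoff against a fixed opponent action."""
    return max(ACTIONS, key=lambda own: payoff[(own, other_action)])


def is_nash(action_i, action_j):
    """True if neither player gains by unilaterally switching his action."""
    i_ok = all(payoff[(action_i, action_j)] >= payoff[(a, action_j)] for a in ACTIONS)
    j_ok = all(payoff[(action_j, action_i)] >= payoff[(a, action_i)] for a in ACTIONS)
    return i_ok and j_ok


# D is the best reply to both C and D, so (D, D) is the only Nash equilibrium,
# even though mutual cooperation would pay each player more (R > P).
nash_outcomes = [(a, b) for a in ACTIONS for b in ACTIONS if is_nash(a, b)]
print(nash_outcomes)
```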

One can assign to the PD game a payoff matrix $M^{RSTP}$ given by

$$M^{RSTP} = \begin{pmatrix} R & S \\ T & P \end{pmatrix},$$

which summarizes the payoffs for row actions when confronted with column actions (rows and columns
ordered as C, D).

The emergence of cooperation in Prisoner's Dilemma (PD) games is generally assumed to require
repeated play (and strategies such as Tit for Tat (TFT) [1], involving memory of previous
interactions) or features ("tags") permitting cooperators and defectors to distinguish one another
[22].

In this work, I consider a simple model of selfish agents, possessing neither memory nor tags, to
study the self-organized cooperative states which emerge when they play an extended PD game with
arbitrary payoffs, i.e. payoffs which do not necessarily fulfill inequalities (1). The taxonomy of
2x2 games (one-shot games involving two players with two actions each) was constructed by Rapoport
and Guyer [23], who showed that there exist exactly 78 non-equivalent games.

There are $N_{ag}$ agents, with one variable assigned to each agent at the site or cell $k$ and at
time $t$: his probability of cooperation $p_k(t)$. Pairs of agents, $i$ and $j$, interact by playing
the PD game at each time step $t$. I use a Mean Field (MF) approach, in which all the spatial
correlations in the system are neglected, and thus agents $i$ and $j$ are chosen at random. After
playing the PD game the players update their probabilities of cooperation $p_i(t)$ and $p_j(t)$
according to the same definite "measure of success", which does not vary with time. Thus all agents
follow a universal and invariant strategy defined by a measure of success plus an updating rule
that transforms $p_i(t)$ and $p_j(t)$ into $p_i(t+1)$ and $p_j(t+1)$.

After a transient, the system self-organizes into a state of equilibrium characterized by the
average probability of cooperation $p_{eq}$, which depends on the payoff matrix.

Payoff matrices can be classified into sub-categories according to their dominant strategy. Let us
call $M_D$ the class of those matrices such that

$$T > R \quad \text{and} \quad P > S, \tag{2}$$

for which the dominant strategy is D. This class comprises, for instance, the canonical matrix
$M^{3051}$, $M^{1053}$, etc. A second class, $M_C$, corresponds to

$$R > T \quad \text{and} \quad S > P, \tag{3}$$

for which the dominant strategy is C; examples of this class are the matrices $M^{5310}$ and
$M^{3501}$. The remaining matrices do not comply with either (2) or (3) and produce situations which
are, a priori, not dominated by (D,D) or (C,C).

One might wonder why one should bother to study matrices which imply no dilemma and are unrealistic
as models of the social behavior of the majority of individuals. There are several reasons. First,
these "unreasonable" payoff matrices can be used by minorities of individuals which depart from the
"normal" ones (assumed to be neutral): for instance, "antisocial" "always D" individuals, who
cannot appreciate any advantage of cooperation, or "altruistic" "always C" individuals. Second, it
seems interesting to test the robustness of cooperation under changes in the payoff matrix. In
particular, we will see that even payoff matrices which imply a dilemma can produce either
$p_{eq} = 0.5$ or $p_{eq} = 0$. Third, arbitrary payoff matrices could also be of importance in
contexts other than societies. One might envisage situations in which a definite value of $p_{eq}$
is required or desirable in the design of a system, or is the one which optimizes the functioning
of a particular mechanism, etc. An example is understanding how a market of competing firms attains
self-regulation. Another is the traffic problem, where the damage suffered from mutual D (crash)
exceeds the damage suffered by being exploited (turn away), which is more appropriately described
by the so-called chicken game, for which T > R > S > P.

Fourth, we will show results for these payoff matrices which, at first glance, defy our intuition:
for example, payoff matrices which one would bet, at least in principle, favor defection but which
in fact produce a not so low degree of cooperation.
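The classification by dominant strategy given in conditions (2) and (3) is easy to automate. The sketch below is only illustrative: the function name `dominant_class` and the chicken-game payoff values are assumptions chosen for the example, not taken from the text.

```python
def dominant_class(R, S, T, P):
    """Classify a payoff matrix M^{RSTP} by its dominant strategy.

    Returns "M_D" when condition (2) holds (T > R and P > S),
    "M_C" when condition (3) holds (R > T and S > P),
    and "none" when neither action dominates.
    """
    if T > R and P > S:
        return "M_D"
    if R > T and S > P:
        return "M_C"
    return "none"


# Matrices named in the text, written as (R, S, T, P).
examples = {
    "M^3051": (3, 0, 5, 1),
    "M^1053": (1, 0, 5, 3),
    "M^5310": (5, 3, 1, 0),
    "M^3501": (3, 5, 0, 1),
    # Hypothetical chicken-game payoffs with T > R > S > P (values assumed here).
    "chicken": (3, 1, 5, 0),
}

for name, payoffs in examples.items():
    print(name, dominant_class(*payoffs))
```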

II. A MECHANISM TO PRODUCE COOPERATIVE EQUILIBRIUM STATES 

Among the weaknesses of the major approaches that have been considered to answer the question of
the emergence of cooperation, two are often remarked upon. The first criticism concerns the
over-simplification of the behavior of agents: they either always cooperate (C) or always defect
(D). Clearly, this is not very realistic. Indeed, the levels of cooperation of individuals seem to
exhibit a continuous range of values. The second objection concerns the deterministic nature of the
algorithms, which seem to fail to incorporate the stochastic component of human behavior.

Both problems can be overcome by assigning to each agent $k$ a probability of cooperation $p_k(t)$
(a real number in the interval [0,1]) instead of a definite behavior like C or D. Concerning the
first objection, $p_k(t)$ reflects the existence of a "gray scale" of levels of cooperation instead
of just "black" and "white". Regarding the second objection, the proposed algorithm is clearly
non-deterministic: agent $k$ plays C with probability $p_k$ and D with probability $1 - p_k$.

Now, let us describe the dynamics. The pairs of interacting partners, by virtue of the MF
treatment, are chosen randomly instead of being restricted to some neighborhood. The implicit
assumptions are that the population is sufficiently large and that the system connectivity is high,
i.e. the agents display high mobility or they experience interaction at a distance (for instance,
electronic transactions). In this work the population of agents is fixed at $N_{ag} = 1000$ and the
number of time steps is of order $t_f = 10^5$ to $10^6$, in such a way that both assumptions are
also consistent with the fact that agents have no memory.

Starting from an initial state at $t = 0$, with $p_k(0)$ chosen at random (in the interval [0,1])
for each agent $k$, the system evolves by iteration during $t_f$ time steps following this
procedure.

• 1) Selection of players: Two agents, located at random positions $i$ and $j$, are selected to
interact, i.e. to play the PD game.

• 2) Playing pairwise PD: The behavior, C or D, of each player $k$ ($k = i$ or $k = j$) is decided
by generating a random number $r$: if $p_k(t) > r$ he plays C and, conversely, if $p_k(t) < r$ he
plays D.

• 3) Assessment of success: Each of the two players compares his utility $U_k(t)$, which is one of
the four PD payoffs R, S, T or P, with an estimate $\epsilon_k(t)$ of his expected utility. If
$U_k(t) > \epsilon_k(t)$ ($U_k(t) < \epsilon_k(t)$) the agent assumes he is doing well (badly) and
therefore that his level of cooperation is adequate (inadequate).

• 4) Probability of cooperation update: If player $k$ is doing well he keeps his probability of
cooperation $p_k(t)$. On the other hand, if player $k$ is doing badly he decreases (increases) his
probability of cooperation $p_k(t)$ if he played C (D), choosing a uniformly distributed value
between 0 and $p_k(t)$ (between $p_k(t)$ and 1).

In order to introduce a simple and natural estimate $\epsilon_k(t)$, let us consider two players
$i$ and $j$ who cooperate, at time $t$, with probabilities $p_i(t)$ and $p_j(t)$ respectively (and
defect with probabilities $1 - p_i(t)$ and $1 - p_j(t)$). The expected utility for player $i$,
$U_i^{RSTP}(t)$, is then given by

$$U_i^{RSTP}(t) = R\,p_i(t)\,p_j(t) + S\,p_i(t)\,(1 - p_j(t)) + T\,(1 - p_i(t))\,p_j(t) + P\,(1 - p_i(t))(1 - p_j(t)), \tag{4}$$

while the expected utility for player $j$, $U_j^{RSTP}(t)$, is obtained by interchanging $i$ and
$j$ in the above equation.

This implies that, given the average probability of cooperation $p(t)$, an arbitrary agent, say
number $k$, can estimate his average expected utility as

$$U_k^{RSTP}(p(t)) = R\,p(t)\,p_k(t) + S\,p_k(t)\,(1 - p(t)) + T\,(1 - p_k(t))\,p(t) + P\,(1 - p(t))(1 - p_k(t)). \tag{5}$$

However, in general the value of $p$ is unknown to the agents. Hence a simpler estimate that agent
$k$ can use for his expected utility, $\epsilon_k^{RSTP}(t)$, is obtained by replacing $p(t)$ in
equation (5) with his own probability of cooperation $p_k(t)$:

$$\epsilon_k^{RSTP}(t) = R\,p_k^2(t) + (S + T)\,p_k(t)\,(1 - p_k(t)) + P\,(1 - p_k(t))^2 = (R - S - T + P)\,p_k^2(t) + (S + T - 2P)\,p_k(t) + P. \tag{6}$$

In other words, agent $k$ adopts the simplest possible extrapolation, i.e. that he is a "normal"
individual whose probability of cooperation is representative of the average value (see footnote 1).
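As a quick numerical sanity check of the step from equation (5) to equation (6), one can substitute the agent's own probability for the population average and compare with the expanded polynomial. This is a minimal sketch; the function names `expected_utility` and `estimate` are illustrative choices.

```python
import math


def expected_utility(p_own, p_other, R, S, T, P):
    """Expected payoff of an agent cooperating with probability p_own against
    a partner (or a population average) cooperating with probability p_other,
    following the form of equations (4) and (5)."""
    return (R * p_own * p_other
            + S * p_own * (1 - p_other)
            + T * (1 - p_own) * p_other
            + P * (1 - p_own) * (1 - p_other))


def estimate(p_k, R, S, T, P):
    """Self-referential estimate of equation (6), in expanded polynomial form."""
    return (R - S - T + P) * p_k ** 2 + (S + T - 2 * P) * p_k + P


# Equation (6) should coincide with equation (5) evaluated at p = p_k.
for payoffs in [(3, 0, 5, 1), (0, 1, 3, 5), (5, 3, 1, 0)]:
    for n in range(11):
        p = n / 10
        lhs = expected_utility(p, p, *payoffs)
        rhs = estimate(p, *payoffs)
        assert math.isclose(lhs, rhs, abs_tol=1e-12)

print(estimate(0.5, 3, 0, 5, 1))
```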

The rule each player follows to update his probability of cooperation is quite natural and of the
"win-stay, lose-shift" type. That is, if the player's utility $U_k$ is larger than his estimate he
keeps his probability of cooperation. On the other hand, if his utility is smaller than his
estimate he changes his probability of cooperation: a) increasing it if he played D or b)
decreasing it if he played C. From eq. (6), the update $p_k(t) \to p_k(t+1)$ is governed by the
sign of $U_k(t) - \epsilon_k^{RSTP}(t)$, i.e. by the following inequalities, in which the column
lists the four possible values of $U_k(t)$:

$$(S + T - R - P)\,p_k^2(t) - (S + T - 2P)\,p_k(t) + \begin{pmatrix} R \\ S \\ T \\ P \end{pmatrix} - P \gtrless 0; \tag{7}$$

in the case > 0 the agent keeps $p_k$, while in the case < 0 $p_k$ is increased if he played D and
decreased if he played C.
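Steps 1) to 4) together with the estimate of equation (6) can be put into a short simulation. The following is a minimal sketch under stated assumptions, not the author's code: the names `simulate` and `estimate`, the default population size and number of steps (smaller than the $N_{ag} = 1000$ and $t_f = 10^5$ to $10^6$ used in the text, to keep the run short), the seed, and the convention that an agent whose payoff exactly equals his estimate keeps his probability are all choices made here.

```python
import random


def estimate(p, R, S, T, P):
    """Equation (6): the agent assumes everybody cooperates with his own p."""
    return (R - S - T + P) * p * p + (S + T - 2 * P) * p + P


def simulate(R, S, T, P, n_agents=200, n_steps=40_000, seed=0):
    """Return the population-average probability of cooperation after n_steps."""
    rng = random.Random(seed)
    p = [rng.random() for _ in range(n_agents)]          # p_k(0) uniform in [0, 1]
    payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

    for _ in range(n_steps):
        # 1) select two distinct agents at random (mean-field pairing)
        i, j = rng.sample(range(n_agents), 2)
        # 2) each plays C with probability p_k, otherwise D
        act_i = "C" if p[i] > rng.random() else "D"
        act_j = "C" if p[j] > rng.random() else "D"
        # 3) + 4) compare the payoff obtained with the estimate; lose-shift if below it
        for k, own, other in ((i, act_i, act_j), (j, act_j, act_i)):
            if payoff[(own, other)] < estimate(p[k], R, S, T, P):
                if own == "C":
                    p[k] = rng.uniform(0.0, p[k])        # did badly playing C: cooperate less
                else:
                    p[k] = rng.uniform(p[k], 1.0)        # did badly playing D: cooperate more

    return sum(p) / n_agents


if __name__ == "__main__":
    for name, payoffs in [("M^0153", (0, 1, 5, 3)), ("M^3501", (3, 5, 0, 1))]:
        print(name, round(simulate(*payoffs), 3))
```

With these two matrices the average should drift toward the extreme values reported in the results section; intermediate cases converge more slowly and are better explored with the larger population and longer runs quoted in the text.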

In the next section we will see that the strategy which results from the combination of the
proposed measure of success and the update rule for $p_k$ (steps 3) and 4)) produces, for a wide
variety of payoff matrices, cooperative states with $p_{eq} > 0$.

Let us end this section with a remark about the problem addressed here and its relation to the
evolution of cooperation. In this approach there is no competition between different strategies:
all the agents follow the same universal strategy, which does not evolve over time. However, the
system is adaptive in the sense that the probabilities of cooperation of the agents do evolve.



III. RESULTS 



Depending on the payoffs R, S, T and P, the system self-organizes, after a transient, into
equilibrium states with values of $p_{eq}$ ranging from 0 to 1. The asymptotic equilibrium states
can be lumped into 3 groups according to the degree of cooperation attained: highly cooperative
($p_{eq} > 0.5$), moderately cooperative ($p_{eq} \simeq 0.5$) and poorly cooperative
($p_{eq} < 0.5$). The outcomes for any arbitrary payoff matrix $M^{RSTP}$ can be understood in
terms of the updating rule for the cooperation probability and the corresponding estimate
$\epsilon^{RSTP}$, i.e. from the inequalities (7).

Footnote 1: Considering more sophisticated agents, which have "good information" on the population
(for instance the value of $p$ at time $t$), does not substantially change the main results
obtained with these naive agents.

The payoff matrices which imply a dilemma (those which comply with the chain of inequalities (1))
lead either to $p_{eq} = 1/2$ or to $p_{eq} = 0$. From (7) it emerges that $p_{eq} = 0.5$ occurs
when $\epsilon^{RSTP} - P$ has no roots in the interval (0,1] ($p = 0$ is always one of the two
roots) and $p_{eq} = 0$ in the opposite case.

Some other matrices not belonging to class $M_D$ exhibit a tension between C and D and give rise to
$p_{eq} \simeq 1/2$. The matrices which do not embody such a trade-off produce the situations which
depart most from $p_{eq} = 1/2$.
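The root criterion stated above for dilemma matrices can be applied directly, since $\epsilon^{RSTP} - P = (R - S - T + P)\,p^2 + (S + T - 2P)\,p$ has the roots $p = 0$ and $p = -(S + T - 2P)/(R - S - T + P)$. The sketch below simply encodes that criterion; the function names and the second example, $M^{4053}$, are hypothetical illustrations not found in the text.

```python
def nonzero_root(R, S, T, P):
    """Root of estimate(p) - P other than p = 0, or None if the estimate is linear."""
    a = R - S - T + P
    b = S + T - 2 * P
    if a == 0:
        return None
    return -b / a


def predicted_dilemma_equilibrium(R, S, T, P):
    """Apply the criterion quoted in the text to a matrix obeying T > R > P > S.

    Returns 0.5 when estimate - P has no root in (0, 1], and 0.0 otherwise.
    """
    if not (T > R > P > S):
        raise ValueError("criterion only stated for payoffs obeying inequalities (1)")
    root = nonzero_root(R, S, T, P)
    if root is not None and 0 < root <= 1:
        return 0.0
    return 0.5


# Canonical matrix M^3051: the second root is p = 3, outside (0, 1].
print(predicted_dilemma_equilibrium(3, 0, 5, 1))
# Hypothetical dilemma matrix M^4053: the second root is p = 0.5, inside (0, 1].
print(predicted_dilemma_equilibrium(4, 0, 5, 3))
```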

It is illustrative to consider, for a moment, the restricted subset of 24 payoff matrices obtained
by permuting the four canonical payoff values, because it covers the three groups with different
cooperation levels mentioned above. In fact, the system self-organizes into equilibrium states with
seven values of $p_{eq}$: 2 matrices ($M^{3501}$ and $M^{3510}$) produce $p_{eq} = 1$, and 2
matrices ($M^{1053}$ and $M^{0153}$) produce $p_{eq} = 0$. The remaining 20 matrices produce
intermediate values: $p_{eq} \simeq 0.72$ ($M^{5301}$), $p_{eq} \simeq 0.62$ ($M^{5310}$),
$p_{eq} \simeq 0.38$ ($M^{0135}$), $p_{eq} \simeq 0.28$ ($M^{1035}$) and $p_{eq} = 0.5$ (the other
16 matrices, among them the canonical payoff matrix). The 24 measurements are performed over 500
simulations. Fig. 1 shows the average probability of cooperation for different payoff matrices vs.
time for the first 50,000 time steps.
FIG. 1. Curves of $p$ vs. the number of iterations $t$, corresponding to the 24 payoff matrices
obtained by permuting the four canonical payoffs R = 3, S = 0, T = 5 and P = 1. The system
self-organizes into 7 different cooperative states with: $p_{eq} = 1$, $p_{eq} \simeq 0.72$,
$p_{eq} \simeq 0.62$, $p_{eq} \simeq 0.5$, $p_{eq} \simeq 0.38$, $p_{eq} \simeq 0.28$ and
$p_{eq} \simeq 0$.



The mirror symmetry with respect to the value $p = 0.5$ between the curves for $p(t)$ corresponding
to a given matrix $M^{RSTP}$ and its palindrome $M^{PTSR}$ is due to the symmetry of the game under
the interchange of R with P and S with T, simultaneously with the interchange of cooperators C and
defectors D. That is,

$$p^{RSTP}(t) = 1 - p^{PTSR}(t). \tag{8}$$
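One ingredient of this symmetry can be verified directly from equation (6): the estimate for $M^{RSTP}$ evaluated at $p$ equals the estimate for the palindrome $M^{PTSR}$ evaluated at $1 - p$. The sketch below checks this identity for the 24 permutations of the canonical payoffs; the helper names are illustrative, and the check concerns only the estimate, not the full dynamics.

```python
import math
from itertools import permutations


def estimate(p, R, S, T, P):
    """Equation (6) in its unexpanded form."""
    return R * p * p + (S + T) * p * (1 - p) + P * (1 - p) ** 2


def palindrome(R, S, T, P):
    """Payoffs of M^{PTSR}, obtained by swapping R with P and S with T."""
    return (P, T, S, R)


# All 24 matrices obtained by permuting the canonical payoff values, as (R, S, T, P).
matrices = list(permutations((0, 1, 3, 5)))

for m in matrices:
    for n in range(11):
        p = n / 10
        assert math.isclose(estimate(p, *m),
                            estimate(1 - p, *palindrome(*m)),
                            abs_tol=1e-12)

print(len(matrices), "matrices checked")
```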

A particularly interesting case study is provided by the payoff matrix $M^{0135}$, with
$p_{eq} \simeq 0.38$. This result seems, at first sight, counter-intuitive: an intermediate
cooperation level attained with zero reward (and a very low sucker's payoff)! Nevertheless, let us
show how the updating rule for the cooperation probability explains this outcome. The estimate for
this matrix is given by the parabola

$$\epsilon_k^{0135} = p_k^2 - 6\,p_k + 5, \tag{9}$$

plotted as a solid curve in Fig. 2 (the horizontal lines at S = 1 and T = 3 cut the parabola at
abscissas $p_S = 3 - \sqrt{5}$ and $p_T = 3 - \sqrt{7}$, respectively). The cooperation update rule
tells us that agent $k$ increases his probability of cooperation when he plays D and gets T = 3 if
$p_k$ is less than $p_T = 3 - \sqrt{7} < 0.5$, i.e. when this temptation is not enough
($T < \epsilon_k^{0135}(p_k)$). On the other hand, he decreases his probability of cooperation when
he plays C and gets R = 0, independently of the value of $p_k$, or when he gets S = 1 if $p_k$ is
less than $p_S = 3 - \sqrt{5} > 0.5$. In the remaining situations the player keeps his probability
of cooperation. Thus a value of $p_{eq}$ between 0 and 0.5 is not surprising after all; rather, it
is the result of these two competing flows of the probability of cooperation. All this analysis for
the payoff matrix $M^{0135}$ also works for any set of payoffs obeying the inequalities

$$P > T > S > R; \tag{10}$$

the only thing that changes is the value of $p_{eq}$. We will come back to this particular payoff
matrix to illustrate how $p_{eq}$ changes under arbitrary variations of the payoffs.
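The abscissas $p_S$ and $p_T$ quoted for $M^{0135}$ follow from solving $\epsilon_k(p) = U$ with the quadratic formula. A minimal sketch, assuming the expanded form of equation (6); the function name `crossing` and its tolerance are choices made for this illustration.

```python
import math


def crossing(U, R, S, T, P):
    """Smallest p in [0, 1] at which the estimate of eq. (6) equals the payoff U.

    Returns None if the estimate never takes the value U on [0, 1].
    """
    a = R - S - T + P
    b = S + T - 2 * P
    c = P - U
    if a == 0:                       # the estimate is a straight line
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None
        sq = math.sqrt(disc)
        roots = [(-b - sq) / (2 * a), (-b + sq) / (2 * a)]
    inside = [p for p in roots if -1e-12 <= p <= 1 + 1e-12]
    return min(inside) if inside else None


R, S, T, P = 0, 1, 3, 5              # payoff matrix M^0135
p_T = crossing(T, R, S, T, P)        # where the estimate equals the temptation
p_S = crossing(S, R, S, T, P)        # where the estimate equals the sucker's payoff
print(round(p_T, 3), round(p_S, 3))
```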

The Effect of changing payoffs 

We are now going to analyze the effect of changing the payoff matrix in order to go beyond the 24 
permutations of the canonical payoffs. 

We have seen that the sign of $U_k - \epsilon_k$ controls the update of $p_k$. From the definition
of $\epsilon_k$ as an estimate of the utility of agent $k$, it is clear that it is bounded from
above and from below by the largest and smallest of the four payoffs, respectively. Thus
$U_k - \epsilon_k$ may have different signs, depending on the value of $p_k$, only for the two
intermediate payoffs. Let us denote by $p_1$ the value of $p_k$ such that the estimate $\epsilon_k$
becomes equal to the largest payoff, by $p_2$ the value of $p_k$ such that the estimate becomes
equal to the second largest payoff, and so on. It is then easy to see that the change in $p_{eq}$
is controlled by the displacements of $p_2$ and $p_3$ (for instance, for $M^{0135}$, $p_2 = p_T$
and $p_3 = p_S$). If $p_2$ or $p_3$ corresponds to one of the cooperative payoffs, R or S, then its
displacement to the right (left) decreases (increases) the proportion of C-agents for whom
$U_k > \epsilon_k$, which are, on average, the ones who remain C after playing the game. This in
turn decreases (increases) $p_{eq}$. On the other hand, if $p_2$ or $p_3$ corresponds to one of the
non-cooperative payoffs, T or P, then its displacement to the right (left) decreases (increases)
the proportion of D-agents for whom $U_k > \epsilon_k$, which are, on average, the ones who remain
D after playing the game. This in turn increases (decreases) $p_{eq}$.

The payoff matrix $M^{0135}$ will serve to illustrate the effect that changes in the values of the
payoffs have on $p_{eq}$. We will proceed by modifying one of the four payoffs at a time while
keeping the remaining three fixed, in such a way that the chain of inequalities (10) is preserved.
The variation of a quantity that results when the payoff X is modified and the other three payoffs
remain fixed is denoted by $\delta_X$. The estimates that result from these changes are the curves
shown in Fig. 2. Let us consider first the changes $\delta_{S+}$, produced by an increment in the
sucker's payoff from S = 1 to S = 2 (which transforms $M^{0135}$ into $M^{0235}$), and
$\delta_{S-}$, produced by a decrease from S = 1 to S = 0 (which transforms $M^{0135}$ into
$M^{0035}$). For $M^{0235}$, $p_3$ is the abscissa of the point $\epsilon_k = S = 2$ (filled up
triangle in Fig. 2(A)) and for $M^{0035}$, $p_3$ is the abscissa of the point $\epsilon_k = S = 0$
(filled down triangle in Fig. 2(A)), while the corresponding $p_2$ are the abscissas of the points
$\epsilon_k = T = 3$ (filled triangles: up for $M^{0235}$ and down for $M^{0035}$ in Fig. 2(A)).
We can see that increasing (decreasing) the sucker's payoff, from S = 1 to S = 2 (S = 0), produces
a displacement of $p_2$ to the right (left), from $3 - \sqrt{7} \simeq 0.354$ to 0.4 (to
$(7 - \sqrt{33})/4 \simeq 0.314$), and of $p_3$ to the left (right), from
$3 - \sqrt{5} \simeq 0.764$ to 0.6 (to 1). Hence both changes point in the same direction,
increasing (decreasing) $p_{eq}$, as can be observed in Fig. 3 (dotted lines vs. solid line). In
other words,

$$\delta_{S+}(p_2 - p_3) \simeq (0.4 - 0.354) - (0.6 - 0.764) = 0.21 > 0,$$
$$\delta_{S-}(p_2 - p_3) \simeq (0.314 - 0.354) - (1 - 0.764) = -0.276 < 0. \tag{11}$$

Similarly, we denote by $\delta_{P+}$ the variations produced by an increment in the punishment,
from P = 5 to P = 6 (which transforms $M^{0135}$ into $M^{0136}$), and by $\delta_{P-}$ the
variations produced by a decrease in the punishment, from P = 5 to P = 4 (which transforms
$M^{0135}$ into $M^{0134}$). For both matrices, the corresponding $p_2$ and $p_3$ are the abscissas
of the points $\epsilon_k = T = 3$ and $\epsilon_k = S = 1$ (non-filled triangles in Fig. 2: up for
$M^{0136}$ and down for $M^{0134}$), respectively. Also in Fig. 2(A) we see that changing the
punishment from P = 5 to P = 6 (P = 4) produces a displacement of $p_2$ to the right (left), from
$3 - \sqrt{7} \simeq 0.354$ to $(4 - \sqrt{10})/2 \simeq 0.419$ (to 0.25), and of $p_3$ to the
right (left), from $3 - \sqrt{5} \simeq 0.764$ to $(4 - \sqrt{6})/2 \simeq 0.775$ (to 0.75). Hence
the two changes point in opposite directions: the first tends to increase (decrease) $p_{eq}$ and
the second to decrease (increase) it. As the first displacement is larger it dominates, and the net
result is an increase (decrease) of $p_{eq}$, as can be observed in Fig. 3 (dot-dashed lines vs.
solid line). That is:

$$\delta_{P+}(p_2 - p_3) \simeq (0.419 - 0.354) - (0.775 - 0.764) = 0.054 > 0,$$
$$\delta_{P-}(p_2 - p_3) \simeq (0.25 - 0.354) - (0.75 - 0.764) = -0.09 < 0. \tag{12}$$
FIG. 2. (A) Below: The estimate $\epsilon^{0135}(p)$ vs. $p$ (solid line) compared with the
estimates that result from independent variations of payoffs S and P, one at a time:
$\epsilon^{0035}$ (dotted thin line), $\epsilon^{0235}$ (dotted thick line), $\epsilon^{0134}$
(dot-dashed thin line) and $\epsilon^{0136}$ (dot-dashed thick line). The circles correspond to the
points $\epsilon^{0135} = 1$ and $\epsilon^{0135} = 3$. The filled up (down) triangles correspond
to the points $\epsilon^{0235} = 2$ and $\epsilon^{0235} = 3$ ($\epsilon^{0035} = 0$ and
$\epsilon^{0035} = 3$). The non-filled up (down) triangles correspond to the points
$\epsilon^{0136} = 1$ and $\epsilon^{0136} = 3$ ($\epsilon^{0134} = 1$ and $\epsilon^{0134} = 3$).
(B) Above: The estimate $\epsilon^{0135}(p)$ vs. $p$ (solid line) compared with the estimates that
result from independent variations of payoffs T and R, one at a time: $\epsilon^{0125}$ (dashed
thin line), $\epsilon^{0145}$ (dashed thick line) and $\epsilon^{1135}$ (+'s). The circles
correspond to the points $\epsilon^{0135} = 1$ and $\epsilon^{0135} = 3$. The filled up triangles
correspond to the points $\epsilon^{1135} = 1$ and $\epsilon^{1135} = 3$. The non-filled up (down)
triangles correspond to the points $\epsilon^{0145} = 1$ and $\epsilon^{0145} = 4$
($\epsilon^{0125} = 1$ and $\epsilon^{0125} = 2$).



On the other hand, let us consider the variations $\delta_{T+}$ produced by an increment of the
temptation, from T = 3 to T = 4 (which transforms $M^{0135}$ into $M^{0145}$), and $\delta_{T-}$
produced by its decrease, from T = 3 to T = 2 (which transforms $M^{0135}$ into $M^{0125}$). For
$M^{0145}$, $p_2$ is the abscissa corresponding to the point $\epsilon_k = T = 4$ (non-filled up
triangle) and for $M^{0125}$, $p_2$ is the abscissa of the point $\epsilon_k = T = 2$ (non-filled
down triangle), while the corresponding $p_3$ are the abscissas of the points $\epsilon_k = S = 1$
(non-filled triangles: up for $M^{0145}$ and down for $M^{0125}$). In Fig. 2(B) we can see that
increasing (decreasing) the temptation, from T = 3 to T = 4 (T = 2), produces a displacement of
$p_2$ to the left (right), from $3 - \sqrt{7} \simeq 0.354$ to 0.2 (to 0.5), and of $p_3$ to the
right (left), from $3 - \sqrt{5} \simeq 0.764$ to 0.8 (to $(7 - \sqrt{17})/4 \simeq 0.719$). Hence
both changes point in the same direction, decreasing (increasing) $p_{eq}$, as can be observed in
Fig. 3 (dashed lines vs. solid line). That is:

$$\delta_{T+}(p_2 - p_3) \simeq (0.2 - 0.354) - (0.8 - 0.764) = -0.19 < 0,$$
$$\delta_{T-}(p_2 - p_3) \simeq (0.5 - 0.354) - (0.719 - 0.764) = 0.19 > 0. \tag{13}$$

With a similar argument one realizes that on increasing (decreasing) the reward R, $p_{eq}$
decreases (increases).
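The displacements entering equations (11) to (13) can be recomputed from the estimate of equation (6). The sketch below is an illustration under the same conventions as the text ($p_2$ is where the estimate equals T, $p_3$ where it equals S); the names `crossing` and `delta` and the dictionary of variants are introduced here for convenience.

```python
import math


def crossing(U, R, S, T, P):
    """Smallest p in [0, 1] at which the estimate of eq. (6) equals U, else None."""
    a, b, c = R - S - T + P, S + T - 2 * P, P - U
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None
        sq = math.sqrt(disc)
        roots = [(-b - sq) / (2 * a), (-b + sq) / (2 * a)]
    inside = [p for p in roots if -1e-12 <= p <= 1 + 1e-12]
    return min(inside) if inside else None


BASE = (0, 1, 3, 5)                                  # M^0135 as (R, S, T, P)


def delta(variant):
    """Change of (p_2 - p_3) when M^0135 is replaced by the variant matrix."""
    def p2_p3(R, S, T, P):
        return crossing(T, R, S, T, P), crossing(S, R, S, T, P)
    p2_base, p3_base = p2_p3(*BASE)
    p2_var, p3_var = p2_p3(*variant)
    return (p2_var - p2_base) - (p3_var - p3_base)


variants = {
    "S+": (0, 2, 3, 5), "S-": (0, 0, 3, 5),
    "P+": (0, 1, 3, 6), "P-": (0, 1, 3, 4),
    "T+": (0, 1, 4, 5), "T-": (0, 1, 2, 5),
}
deltas = {name: delta(m) for name, m in variants.items()}
for name, value in deltas.items():
    print(name, round(value, 3))
```

The values agree with those quoted in (11) to (13) up to the rounding of the abscissas used there.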

In summary, for payoff matrices like $M^{0135}$, which obey the chain of inequalities (10), we
found two expected results: a higher value of $p_{eq}$ can be reached by increasing the sucker's
payoff S (which makes C-agents more altruistic) or by decreasing the temptation T (reducing the
incentives to free ride). Additionally we found two, a priori, unexpected results: a higher value
of $p_{eq}$ can also be reached by increasing the punishment P or by decreasing the reward R. By
inspection of Fig. 2(A), the effect of an increment of P can be understood as raising the
expectations of the D-agents, which in turn diminishes the fraction of agents that are satisfied
after playing the game. Similarly, from Fig. 2(B) we can see that a decrease of R makes the
C-agents less ambitious and increases the fraction of altruists.
FIG. 3. The effect of independent variations of payoffs S, T and P (the reward remains fixed at
R = 0), one at a time, from payoff matrix $M^{0135}$. Variations of S: S = 2 (dotted thick line),
S = 0 (dotted thin line). Variations of P: P = 6 (dot-dashed thick line), P = 4 (dot-dashed thin
line). Variations of T: T = 4 (dashed thick line), T = 2 (dashed thin line). Variations of R:
R = 1 ('+').



It is worth remarking that, for the case of payoffs obeying (10), something which at first seems as
innocent as interchanging the two non-cooperative payoffs T and P has a dramatic consequence: it
transforms a system with an intermediate level of cooperation into one with null cooperation. This
can be understood by comparing the estimate (9) for payoff matrix $M^{0135}$ with the one for
$M^{0153}$, which is given by

$$\epsilon_k^{0153} = -3\,p_k^2 + 3. \tag{14}$$

Both estimates have a maximum value of P (5 and 3, respectively, at $p_k = 0$), but the important
difference is that in the first case P is the maximum payoff while in the second one P < T. Thus in
this second case only the agents which play C can do badly, and then the only possible change for
$p_k$ (according to its update rule) is a reduction until it reaches zero.
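The contrast between the two matrices can be made concrete by asking, on a grid of $p_k$ values, whether a D-agent can ever receive less than his estimate. A minimal sketch, assuming "doing badly" means a payoff strictly below the estimate of equation (6); the function names and the grid resolution are illustrative.

```python
def estimate(p, R, S, T, P):
    """Equation (6) in expanded form."""
    return (R - S - T + P) * p * p + (S + T - 2 * P) * p + P


GRID = [n / 100 for n in range(101)]         # p_k = 0.00, 0.01, ..., 1.00


def d_agent_can_do_badly(R, S, T, P):
    """True if, for some p_k on the grid, a defector's payoff (T or P) is below his estimate."""
    return any(T < estimate(p, R, S, T, P) or P < estimate(p, R, S, T, P) for p in GRID)


def c_agent_can_do_badly(R, S, T, P):
    """True if, for some p_k on the grid, a cooperator's payoff (R or S) is below his estimate."""
    return any(R < estimate(p, R, S, T, P) or S < estimate(p, R, S, T, P) for p in GRID)


# M^0135: defectors who get T = 3 are dissatisfied at small p_k, so p_k can increase.
print(d_agent_can_do_badly(0, 1, 3, 5))
# M^0153: the estimate never exceeds P = 3 < T = 5, so defectors never change.
print(d_agent_can_do_badly(0, 1, 5, 3))
```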

Finally, let us include a note regarding the efficiency in attaining cooperative regimes. The state
of maximum cooperation, $p_{eq} = 1$, is reached for payoff matrices such that S > R > max{T, P},
plus the condition that the equation

$$(S + T - R - P)\,p^2 - (S + T - 2P)\,p + R - P = 0 \tag{15}$$

has no roots in the interval [0,1] different from $p = 1$ (which is always a root of (15)). This
condition on the roots is needed because in the opposite case, when there is a root $p_x$ between 0
and 1, it follows easily from the inequalities (7) that $p$ converges to the semi-sum of $p_x$ and
1, i.e. $(p_x + 1)/2$. It can be checked by elementary algebra that the payoff matrices $M^{3501}$
and $M^{3510}$, for instance, satisfy this condition.
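Since $p = 1$ is always a root of (15), the second root follows from the product of the roots, $p_x = (R - P)/(S + T - R - P)$, whenever the quadratic coefficient is non-zero. The sketch below encodes the stated condition for $p_{eq} = 1$; the function names are illustrative and the third example uses hypothetical payoffs that are not in the text.

```python
import math


def other_root_of_eq15(R, S, T, P):
    """Root of equation (15) other than p = 1, or None when (15) is linear."""
    a = S + T - R - P
    if a == 0:
        return None
    return (R - P) / a


def satisfies_full_cooperation_condition(R, S, T, P):
    """Check S > R > max(T, P) and that (15) has no root in [0, 1] other than p = 1."""
    if not (S > R > max(T, P)):
        return False
    p_x = other_root_of_eq15(R, S, T, P)
    if p_x is None or math.isclose(p_x, 1.0):
        return True
    return not (0.0 <= p_x <= 1.0)


print(satisfies_full_cooperation_condition(3, 5, 0, 1))    # M^3501, second root p = 2
print(satisfies_full_cooperation_condition(3, 5, 1, 0))    # M^3510, double root at p = 1
# Hypothetical payoffs (R, S, T, P) = (3, 10, 0, 1): second root p = 1/3 lies inside [0, 1].
print(satisfies_full_cooperation_condition(3, 10, 0, 1))
```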

IV. CONCLUSIONS 

The success of the strategy in attaining cooperative regimes for a wide variety of games (payoff
matrices), mainly those which imply dilemmas or clearly favor D, relies on the combination of the
proposed measure of success and the update rule for the probability of cooperation. Basically it
works by tuning the agent's cooperation, guided by a trade-off between efficiency (increase of
utilities) and equity (indirect reciprocity). If the agent is doing well he maintains his
probability of cooperation; otherwise he changes it. When he is doing badly playing D he becomes
more cooperative, i.e. he increases his probability of cooperation, attempting to change to
behavior C and explore this alternative behavior. Conversely, if he is doing badly playing C then
he decreases his probability of cooperation, attempting to change to behavior D and see what
happens.

An interesting extension of the model would be to allow competition between different strategies in
order to promote their evolution, i.e. players who imitate the best-performing ones, in such a way
that lower-scoring strategies decrease in number and higher-scoring ones increase.

Another possibility would be to allow the use of distinct payoff matrices. For instance,
individuals inclined to cooperate (defect) might be represented by agents using the payoff matrix
$M^{5301}$ ($M^{1035}$), while "neutral" ordinary agents would be represented by those using the
canonical payoff matrix $M^{3051}$. This would make it possible to study whether mutants inclined
to D can invade a group of neutral individuals or individuals inclined to C and drive out all
cooperation.

Here I considered a MF approximation which neglects all spatial correlations. One virtue of this
simplification is that it shows that the model does not require agents to interact only with those
within some geographical proximity in order to sustain cooperation. Playing with fixed neighbors is
sometimes considered an important ingredient for successfully maintaining the cooperative regime
[25], [26]. However, the quality of this MF approximation depends on the nature of the system one
desires to model, and varies depending on whether one deals with human societies, viruses [27],
cultures of bacteria [28] or markets of providers of different products. In order to consider
situations in which the effect of geographic closeness cannot be neglected, an alternative version
of this model might include a spatial PD game, in which individuals interact only (or mainly) with
those within some geographical proximity. In that case, the study of spatial patterns seems an
interesting issue to address. Work is in progress in that direction.